NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Unmasking Database Vulnerabilities: Zero-Knowledge Schema Inference Attacks in Text-to-SQL Systems

Klisura, Dorde; Rios, Anthony (April 2025, Findings of the Association for Computational Linguistics: NAACL 2025)

Text-to-SQL systems empower users to interact with databases using natural language, automatically translating queries into executable SQL code. However, their reliance on database schema information for SQL generation exposes them to significant security vulnerabilities, particularly schema inference attacks that can lead to unauthorized data access or manipulation. In this paper, we introduce a novel zero-knowledge framework for reconstructing the underlying database schema of text-to-SQL models without any prior knowledge of the database. Our approach systematically probes text-to-SQL models with specially crafted questions and leverages a surrogate GPT-4 model to interpret the outputs, effectively uncovering hidden schema elements—including tables, columns, and data types. We demonstrate that our method achieves high accuracy in reconstructing table names, with F1 scores of up to .99 for generative models and .78 for fine-tuned models, underscoring the severity of schema leakage risks. We also show that our attack can steal prompt information in non-text-to-SQL models. Furthermore, we propose a simple protection mechanism for generative models and empirically show its limitations in mitigating these attacks.
more » « less
Free, publicly-accessible full text available April 1, 2026
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations

Ford, James; Zhao, Xingmeng; Schumacher, Dan; Rios, Anthony (May 2025, Proceedings of the 31st International Conference on Computational Linguistics)
Rambow, Owen; Wanner, Leo; Apidianaki, Marianna; Khalifa, Hend; Eugenio, Barbara; Schockaert, Steven (Ed.)
We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable, or focus solely on data accuracy, neglecting the effectiveness of visual communication. By employing VQA models, we assess data representation quality and the general communicative clarity of charts. Experiments were conducted using two leading VQA benchmark datasets, ChartQA and PlotQA, with visualizations generated by OpenAI’s GPT-3.5 Turbo and Meta’s Llama 3.1 70B-Instruct models. Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures. Moreover, while our results demonstrate that few-shot prompting significantly boosts the accuracy of chart generation, considerable progress remains to be made before LLMs can fully match the precision of human-generated graphs. This underscores the importance of our work, which expedites the research process by enabling rapid iteration without the need for human annotation, thus accelerating advancements in this field.
more » « less
Free, publicly-accessible full text available May 2, 2026
Bike Frames: Understanding the Implicit Portrayal of Cyclists in the News

Zhao, Xingmeng; Schumacher, Dan; Nalluri, Sashank; Walton, Xavier; Shrestha, Suhana; Rios, Anthony (May 2025, Proceedings of the International AAAI Conference on Web and Social Media)

Increasing cycling for transportation or recreation can boost health and reduce the environmental impacts of vehicles. However, news agencies' ideologies and reporting styles often influence public perception of cycling. For example, if news agencies overly report cycling accidents, it may make people perceive cyclists as "dangerous," reducing the number of cyclists who opt to cycle. Additionally, a decline in cycling can result in less government funding for safe infrastructure. In this paper, we develop a method for detecting the perceived perception of cyclists within news headlines. We introduce a new dataset called ``Bike Frames'' to accomplish this. The dataset consists of 31,480 news headlines and 1,500 annotations. Our focus is on analyzing 11,385 headlines from the United States. We also introduce the BikeFrame Chain-of-Code framework to predict cyclist perception, identify accident-related headlines, and determine fault. This framework uses pseudocode for precise logic and integrates news agency bias analysis for improved predictions over traditional chain-of-thought reasoning in large language models. Our method substantially outperforms other methods, and most importantly, we find that incorporating news bias information substantially impacts performance, improving the average F1 from .739 to .815. Finally, we perform a comprehensive case study on US-based news headlines, finding reporting differences between news agencies and cycling-specific websites as well as differences in reporting depending on the gender of cyclists. WARNING: This paper contains descriptions of accidents and death.
more » « less
Free, publicly-accessible full text available May 4, 2026
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations

Ford, James; Zhao, Xingmeng; Schumacher, Dan; Rios, Anthony (January 2025, Proceedings of the International Conference on Computational Linguistics)

We propose a novel framework that leverages Visual Question Answering (VQA) models to automate the evaluation of LLM-generated data visualizations. Traditional evaluation methods often rely on human judgment, which is costly and unscalable, or focus solely on data accuracy, neglecting the effectiveness of visual communication. By employing VQA models, we assess data representation quality and the general communicative clarity of charts. Experiments were conducted using two leading VQA benchmark datasets, ChartQA and PlotQA, with visualizations generated by OpenAI’s GPT-3.5 Turbo and Meta’s Llama 3.1 70B-Instruct models. Our results indicate that LLM-generated charts do not match the accuracy of the original non-LLM-generated charts based on VQA performance measures. Moreover, while our results demonstrate that few-shot prompting significantly boosts the accuracy of chart generation, considerable progress remains to be made before LLMs can fully match the precision of human-generated graphs. This underscores the importance of our work, which expedites the research process by enabling rapid iteration without the need for human annotation, thus accelerating advancements in this field.
more » « less
Full Text Available
Beyond Text-to-SQL for IoT Defense: A Comprehensive Framework for Querying and Classifying IoT Threats

Pavlich, Ryan; Ebadi, Nima; Tarbell, Richard; Linares, Billy; Tan, Adrian; Humphreys, Rachael; Das, Jayanta; Ghandiparsi, Rambod; Haley, Hannah; George, Jerris; et al (May 2025, Proceedings of the Workshop on Trustworthy NLP (TrustNLP 2025@NAACL))

Recognizing the promise of natural language interfaces to databases, prior studies have emphasized the development of text-to-SQL systems. Existing research has generally focused on generating SQL statements from text queries, and the broader challenge lies in inferring new information about the returned data. Our research makes two major contributions to address this gap. First, we introduce a novel Internet-of-Things (IoT) text-to-SQL dataset comprising 10,985 text-SQL pairs and 239,398 rows of network traffic activity. The dataset contains additional query types limited in prior text-to-SQL datasets, notably, temporal-related queries. Our dataset is sourced from a smart building’s IoT ecosystem exploring sensor read and network traffic data. Second, our dataset allows two-stage processing, where the returned data (network traffic) from a generated SQL can be categorized as malicious or not. Our results show that joint training to query and infer information about the data improves overall text-to-SQL performance, nearly matching that of substantially larger models. We also show that current large language models (e.g., GPT3.5) struggle to infer new information about returned data (i.e., they are bad at tabular data understanding), thus our dataset provides a novel test bed for integrating complex domain-specific reasoning into LLMs.
more » « less
Free, publicly-accessible full text available May 1, 2026
Translating Natural Language Specifications into Access Control Policies by Leveraging Large Language Models

https://doi.org/10.1109/TPS-ISA62245.2024.00048

Lawal, Sherifdeen; Zhao, Xingmeng; Rios, Anthony; Krishnan, Ram; Ferraiolo, David (October 2024, IEEE)

Full Text Available
Can GPT-4 Detect Euphemisms across Multiple Languages?

Firsich, Todd; Rios, Anthony (June 2024, Proceedings of the Fourth Workshop on Figurative Language Processing (FigLang))

Full Text Available
Team UTSA-NLP at SemEval 2024 Task 5: Prompt Ensembling for Argument Reasoning in Civil Procedures with GPT4

Schumacher, Dan; Rios, Anthony (June 2024, Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024))

Full Text Available
UTSA-NLP at ChemoTimelines 2024: Evaluating Instruction-Tuned Language Models for Temporal Relation Extraction

Zhao, Xingmeng; Rios, Anthony (June 2024, Proceedings of the 6th Clinical Natural Language Processing Workshop)
A Comprehensive Study of Gender Bias in Chemical Named Entity Recognition Models

Zhao, Xingmeng; Niazi, Ali; Rios, Anthony (June 2024, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies)

Full Text Available

« Prev Next »

Search for: All records